NSF PAR Search | NSF Public Access Repository

An R package AZIAD for analysing zero-inflated and zero-altered data

https://doi.org/10.1080/00949655.2023.2207020

Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie (November 2023, Journal of Statistical Computation and Simulation)

Full Text Available
Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data

https://doi.org/10.3390/biotech12030052

Dousti Mousavi, Niloufar; Aldirawi, Hani; Yang, Jie (September 2023, BioTech)

Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, which is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression analysis for such scenarios, including variable screening, model selection, order selection for response categories, and variable selection. We apply our procedure to high-dimensional gene expression data with 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three finalized models: one with 74 genes achieves extremely low cross-entropy loss and a zero prediction error rate under five-fold cross-validation, and two other models, with 31 and 4 genes respectively, are recommended as prognostic multi-gene signatures.
Full Text Available
Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data

https://doi.org/10.3390/genes14020403

Dousti Mousavi, Niloufar; Yang, Jie; Aldirawi, Hani (February 2023, Genes)

Sparse data with a high proportion of zeros arise in various disciplines. Modeling sparse high-dimensional data is a challenging and growing research area. In this paper, we provide statistical methods and tools for analyzing sparse data in a fairly general and complex context. We use two real scientific applications as illustrations: a longitudinal vaginal microbiome dataset and a high-dimensional gene expression dataset. We recommend zero-inflated model selection and significance tests to identify the time intervals in which the pregnant and non-pregnant groups of women differ significantly in terms of Lactobacillus species. We apply the same techniques to select the best 50 genes out of 2426 in the sparse gene expression data. The classification based on our selected genes achieves 100% prediction accuracy. Furthermore, the first four principal components based on the selected genes explain as much as 83% of the model variability.
Full Text Available